ABSTRACT

The maximum likelihood estimation (MLE) method, typically used for polytomous logistic regression, is prone to bias due to both misclassification in the outcome and contamination in the design matrix. Hence, robust estimators are needed. In this study, we propose such a method for nominal response data with continuous covariates. A generalized method of weighted moments (GMWM) approach is developed for dealing with contaminated polytomous response data. In this approach, distances are calculated based on individual sample moments, and Huber weights are applied to observations with large distances. Mallows-type weights are also used to downplay leverage points. We describe theoretical properties of the proposed approach. Simulations suggest that the GMWM performs very well in correcting contamination-caused biases. An empirical application of the GMWM estimator to survey data demonstrates its usefulness.
INTRODUCTION

Polytomous logistic regression models for multinomial data are a powerful technique for relating dependent categorical responses to both categorical and continuous explanatory covariates (McCullagh & Nelder, 1989; Liu & Agresti, 2005). In practice, however, the model building process can be highly influenced by peculiarities in the data. The maximum likelihood estimation (MLE) method, typically used for the polytomous logistic regression model (PLRM), is prone to bias due to both misclassification in the outcome and contamination in the design matrix (Pregibon, 1982; Copas, 1988). Hence, robust estimators are needed.

For categorical covariates, we may apply the MGP estimator (Victoria-Feser & Ronchetti, 1997), the φ-divergence estimator (Gupta et al., 2006), or the robust quadratic distance estimator (Flores, 2001). The least quartile difference estimator can deal with the overdispersion problem (Mebane & Sekhon, 2004). But all these methods are difficult to adapt to continuous covariates.
Generalized method of moments (GMM) estimation can be used as a substitute for MLE. The GMM is particularly useful when the moment conditions are relatively easy to obtain. GMM has been extensively studied in econometrics (Hansen, 1982; Newey & West, 1987; Pakes & Pollard, 1989; Hansen, Heaton & Yaron, 1996; Newey & McFadden, 1994). Under some regularity conditions, the GMM estimator is consistent (Hansen, 1982). With an appropriately chosen weight matrix, GMM achieves the same efficiency as the MLE (Hayashi, 2000). Furthermore, under certain circumstances, GMM provides more flexibility, such as dealing with endogeneity through instrumental variables (Baum, Schaffer & Stillman, 2002).

Like MLE, GMM estimation can be easily corrupted by aberrant observations (Ronchetti & Trojani, 2001). Such observations can cause severe bias in standard parameter estimates if they are not properly accounted for; see Huber (1981), Hampel et al. (2005), and Rousseeuw & Leroy (2003). We therefore propose a modified estimation method based on an outlier robust variant of GMM. The method differs from the kernel-weighted GMM developed for linear time-series data by Kuersteiner (2012) in that the weights are defined by a data-driven method. The new approach is evaluated using asymptotic theory, simulations, and an empirical example.

The robust GMM estimator is motivated by data from a 2006 study on hypertension in a sample of the Chinese population. A total of 520 people completed the survey. Observed variables included demographics, socioeconomic status, weight, height, blood pressure, and food consumption. Sodium intakes were calculated based on overall food consumption. Among those covariates, age, body mass index (BMI), and sodium intake are all continuous. Based on blood pressure measurements, subjects were classified into 4 categories: Normal, Pre-hypertension, Stage 1 and Stage 2 hypertension. Table 1 lists the summary statistics of the sample. One of the research objectives is to examine the association between hypertension and risk factors in the population. Since the proportional odds assumption is violated (the score test for the proportional odds assumption gives $\chi^2 = 182.27$ with 8 degrees of freedom, p < 0.0001), we apply the polytomous logistic model, using the Normal category as the reference level. With $J$ categories, the polytomous logit model has $J - 1$ comparisons, and each comparison has its own set of parameters for all covariates in the model. Therefore, the generalized logit model is less parsimonious than the proportional odds model. However, the simultaneous estimation of all parameters is more efficient than fitting separate models for each comparison. It is another option for ordinal response data, especially when a proportional odds model does not fit the data well. Table 2 lists the output from the model estimated by MLE. The MLE estimates for sodium intake are clearly inconsistent across comparisons, particularly the negative coefficient of sodium intake for the odds between the Stage 2 hypertension and the Normal categories. The inconsistency is more obvious when we plot the odds against sodium intake: see the downward trend of the odds in Fig. 2A. This result contradicts the previous finding that there is a strong relationship between sodium intake and hypertension; see, for example, National Research Council (2005), He & MacGregor (2004) and references therein. Besides, Fig. 2A also shows another strange situation: the higher starting points
Figure 1 Scatter plot of distance vs. leverage, based on MLE. The criteria $c_d$ for the distance and $c_x$ for the leverage are shown.



Table 1 Summary statistics for surveyed subjects.

                                   Hypertension categories
Covariate                    Normal   Pre-hypertension   Stage 1   Stage 2
Gender          Male         138      104                29        8
                Female       87       114                31        9
Age             Mean         43.2     48.8               54.3      60.3
                Std. Dev.    13.7     13.8               12.2      13.4
BMI             Mean         43.2     48.8               54.3      60.3
                Std. Dev.    13.7     13.8               12.2      13.4
Sodium intake   Mean         3.7      3.7                4.6       2.7
                Std. Dev.    3.0      2.4                5.0       2.1


for the odds between the Pre-hypertension and the Normal categories. The scatter plot of distances against leverages (Fig. 1) suggests that some observations are possible outliers: Observations 21, 33, 85, 92, 194, 274, 336, 414, 459, 483, and 489 have large distances (shown in blue), and Observations 37, 83, 263, 459, 483, 485, and 490 have large leverages (shown in red).

The paper is set up as follows. In the next section we present the basic notation, the model, and the standard GMM. "A robust GMM" introduces the outlier robust GMM
Figure 2 Odds plots for sodium intake based on MLE estimates (A) and GMWM estimates (B), for the population of females with age = 40 and BMI = 23.



Table 2 Polytomous logistic regression of the hypertension data: coefficient estimates and standard errors from GMWM and MLE.

                              MLE                                GMWM
Variable   Coefficient   Estimate   Std. Err   p value     Estimate   Std. Err   p value
Sex        β_21           0.7062    0.2022      0.0002      1.3339    0.2269     <0.0001
           β_31           0.9789    0.3235      0.0012      1.0368    0.3013      0.0003
           β_41           1.4193    0.5746      0.0068      0.6753    0.2195      0.0010
Age        β_22           0.0350    0.0075     <0.0001      0.0671    0.0086     <0.0001
           β_32           0.0715    0.0121     <0.0001      0.1139    0.0133     <0.0001
           β_42           0.1096    0.0216     <0.0001      0.0753    0.0103     <0.0001
BMI        β_23           0.1147    0.0316      0.0001      0.1681    0.0360     <0.0001
           β_33           0.2422    0.0474     <0.0001      0.4382    0.0538     <0.0001
           β_43           0.4351    0.0884     <0.0001      0.2279    0.0388     <0.0001
Sodium     β_24           0.0104    0.0349      0.3829      0.1831    0.0355     <0.0001
           β_34           0.0919    0.0426      0.0155      0.2315    0.0486     <0.0001
           β_44          -0.2699    0.1580      0.9562      0.2294    0.0353     <0.0001

Notes.
Std. Err, standard error.



estimator, and gives a detailed exposition of its implementation. In "Results", we compare the performance of the standard MLE with the new estimator using a Monte Carlo experiment. We then apply both estimators to real epidemiological data, and illustrate the usefulness of the robust estimator for application-oriented researchers. We conclude with a discussion of the advantages and limitations of the approach. The appendix gathers the proofs of the asymptotic properties.
MATERIALS AND METHODS
The baseline-category logit model

Assume a random sample of size $n$ from a large population. Each element in the population may be classified into one of $J$ categories. Denote by $y_i = (y_{i1}, y_{i2}, \ldots, y_{iJ})$ the multinomial trial for subject $i$, where $y_{ij} = 1$ when the response is in category $j$ and $y_{ij} = 0$ otherwise, $i = 1, \ldots, n$, $j = 1, \ldots, J$. Thus, $\sum_j y_{ij} = 1$. Suppose $p$ explanatory covariates, with at least one of them being continuous, are observed. Define $x_i = (x_{i1}, \ldots, x_{ip})$ and $x = (x_1, \ldots, x_n)$. We assume that $(y_i, x_i)$ are independently and identically distributed (i.i.d.). Let $\pi_{ij} = \pi_j(x_i) = P(Y_i = j \mid x_i)$ denote the probability that the observation $Y_i$ belongs to category $j$, given covariates $x_i$. We assume the relationship between the probability $\pi_{ij}$ and $x_i$ can be modeled as

$$\log\frac{\pi_j(x_i)}{\pi_1(x_i)} = \beta_{j0} + \beta_{j1}x_{i1} + \cdots + \beta_{jp}x_{ip}, \quad j = 2, \ldots, J, \tag{1}$$

where $\beta_j^T = (\beta_{j0}, \beta_{j1}, \ldots, \beta_{jp})$. Here we set the first category as the reference class. This model is called a baseline-category logit model (Agresti, 2012) or generalized logit model (Stokes, Davis & Koch, 2009). MLE is usually used to estimate the parameters of this model. Here we present an alternative estimation method based on the GMM.
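
As an illustration of model (1), the following minimal Python sketch computes the category probabilities implied by a given set of coefficients. It is not taken from the paper: the function name `category_probabilities` and the list-based layout of the coefficients are assumptions made for this example, and the coefficient values are those of the simulation study reported later.

```python
import math

def category_probabilities(beta, x):
    """Return [pi_1, ..., pi_J] under the baseline-category logit model (1).

    beta: list of J-1 coefficient lists [b_j0, b_j1, ..., b_jp] for j = 2..J
          (category 1 is the reference, so its linear predictor is 0).
    x:    list of p covariate values for one subject.
    """
    # Linear predictors: 0 for the reference category,
    # then b_j0 + sum_k b_jk * x_k for each other category.
    eta = [0.0] + [b[0] + sum(bk * xk for bk, xk in zip(b[1:], x)) for b in beta]
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(eta)
    expo = [math.exp(e - m) for e in eta]
    total = sum(expo)
    return [e / total for e in expo]

# Illustrative coefficients (J = 3 categories, p = 2 covariates).
beta_true = [[1.0, -0.8, -1.0],   # category 2 vs. category 1
             [-0.3, 0.7, -0.5]]   # category 3 vs. category 1
probs = category_probabilities(beta_true, [0.0, 0.0])
```

By construction, `math.log(probs[j] / probs[0])` recovers the linear predictor of category j + 1, which is the defining property of the baseline-category logit.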

Estimation using GMM 

The baseline-category logit model can be viewed as a multivariate model. Define $y_i^{*T} = (y_{i2}, \ldots, y_{iJ})$, since $y_{i1}$ is redundant. Let $X^T = (X_1^T, \ldots, X_n^T)$ be an $n(J-1) \times (p+1)(J-1)$ matrix, with $X_i$ a $(J-1) \times (p+1)(J-1)$ matrix, defined as

$$X_i = \begin{pmatrix} (1, x_i) & 0 & \cdots & 0 \\ 0 & (1, x_i) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & (1, x_i) \end{pmatrix}. \tag{2}$$

In the GMM framework, we define

$$u_i(\beta) = X_i^T (y_i^* - \pi_i^*), \quad i = 1, \ldots, n, \tag{3}$$

where $\pi_i^{*T} = (\pi_{i2}, \pi_{i3}, \ldots, \pi_{iJ})$, and $\beta^T = (\beta_2^T, \beta_3^T, \ldots, \beta_J^T)$ is the $(p+1)(J-1)$ vector of unknown parameters. The population moment condition is

$$E\{u(\beta)\} = 0,$$

with the corresponding sample moment condition

$$U_n(\beta) = \frac{1}{n}\sum_{i=1}^{n} u_i(\beta). \tag{4}$$

The GMM estimator $\hat{\beta}_m$ can be obtained by minimizing the following quadratic objective function

$$Q_n(\beta) = U_n^T(\beta)\,\Sigma_n^{-1}(\beta)\,U_n(\beta),$$

where $\Sigma_n(\beta)$ can be the empirical variance-covariance matrix given by

$$\Sigma_n(\beta) = \frac{1}{n}\sum_{i=1}^{n} u_i(\beta)u_i^T(\beta) - U_n(\beta)U_n^T(\beta).$$

Or, for the best efficiency of the GMM estimation, we can take the information matrix of the polytomous logit model (PLRM), that is,

$$\Sigma_n(\beta) = \sum_{i=1}^{n} X_i^T\left(D_i - \pi_i^*\pi_i^{*T}\right)X_i, \tag{5}$$

where $D_i = \mathrm{diag}(\pi_i^*)$.

In general, $\hat{\beta}_m$ can be computed via an iterative procedure (Hansen, Heaton & Yaron, 1996). Under standard regularity conditions, the GMM estimator $\hat{\beta}_m$ exists and converges in probability to the true parameter $\beta_0$ (Hansen, 1982). A proof of asymptotic normality of GMM can be found on p. 2148 of Newey & McFadden (1994).

A robust GMM 

In this section we introduce the outlier robust GMM estimator. In the following subsection, we specify the moment conditions used for robust estimation. The details of the implementation of the estimator follow.

The generalized method of weighted moments 

The main principle used in the robust GMM estimator is that we replace the moment conditions by a set of observation-weighted moment conditions. Instead of Eq. (3), we define

$$u_i^w(\beta) = w_i X_i^T (y_i^* - \pi_i^*) - c_i, \quad i = 1, \ldots, n, \tag{6}$$

where $c_i = E\{w_i X_i^T (y_i^* - \pi_i^*)\}$. Then the estimation can be based on the moment conditions $E\{u^w(\beta)\} = 0$.

Consequently, the generalized method of weighted moments (GMWM) estimates can be defined by

$$\hat{\beta}_w = \arg\min_{\beta} Q_n^w(\beta), \tag{7}$$

where

$$Q_n^w(\beta) = [U_n^w(\beta)]^T \{\Sigma_n^w(\beta)\}^{-1} U_n^w(\beta), \tag{8}$$


Wang (2014), PeerJ, DOI 10.7717/peerj.467


with

$$U_n^w(\beta) = \sum_{i=1}^{n} u_i^w(\beta). \tag{9}$$

Here we take the summation as the sample moment condition. The advantage of using the summation is that it leads directly to an estimate of the covariance matrix.

This definition is clearly analogous to the standard GMM. If we choose $w_i = 1$ and $c_i = 0$ for all observations, the moment conditions in (6) reduce to the standard moment conditions. Therefore, the standard GMM is a special case of the GMWM.

In order to specify the weights for the robust GMM estimator, we need the following definition of a distance, which is based on individual moment conditions:

$$d_i(\beta) = [u_i^w(\beta)]^T \{\Sigma_n^w(\beta)\}^{-1} u_i^w(\beta), \quad i = 1, \ldots, n. \tag{10}$$

The weight is assigned based on $d_i(\beta)$, that is, $w_d = w(d_i(\beta))$. There are several alternative specifications of weight functions available in the literature (Huber, 1981; Hampel et al., 2005). In this study, Huber's weights are applied:

$$w(d_i(\beta)) = \min\left(1, \frac{c_d}{d_i(\beta)}\right). \tag{11}$$

The above specification of the weight function requires a value for the tuning constant $c_d$. Both the outlier sensitivity and the efficiency of the estimator are determined by this constant. On the one hand, the estimator should be reasonably efficient if the sample contains no outliers. On the other hand, the estimator should be insensitive to outliers. To determine $c_d$, understanding the distribution of $d_i(\beta)$ is critical. Clearly, $u_i^w(\beta)$ is a column vector, and $d_i(\beta)$ is a scalar quadratic distance, so we set $c_d = \chi^2_p(0.975)/n$, where $\chi^2_p(\cdot)$ is the quantile function of the $\chi^2$ distribution with $p$ degrees of freedom.

If we take the information matrix (5) of the PLRM as $\Sigma_n^w(\beta)$, we can compute the leverage for each observation:

$$H_i = X_i \{\Sigma_n^w(\beta)\}^{-1} X_i^T \sigma_i^w, \quad i = 1, \ldots, n, \tag{12}$$

where $\sigma_i^w$ is the $i$th component of $\Sigma_n^w(\beta)$. Then, a Mallows-type weight can be defined based on $\mathrm{trace}(H_i)$; that is, $w_x = w(\mathrm{trace}(H_i))$, to downplay the observations with high leverages. Lesaffre & Albert (1989) suggest that a practical rule for isolating leverage points is to set $c_x = 2(p+1)(J-1)/n$. In this study, we give observations with large leverages zero weight:

$$w_x = w(\mathrm{trace}(H_i)) = \begin{cases} 1 & \text{if } \mathrm{trace}(H_i) < \dfrac{2(p+1)(J-1)}{n}, \\ 0 & \text{otherwise.} \end{cases} \tag{13}$$

An approach often used to combine the two weights is $w_i = w_d \cdot w_x$ (Heritier et al., 2009).
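
The weighting scheme of Eqs. (11) and (13) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the distance and the leverage (the trace of $H_i$) are taken as already computed scalars, and a non-positive distance is given full weight, a convention the paper does not spell out.

```python
def huber_weight(d, c_d):
    """Huber-type weight of Eq. (11): 1 for small distances, c_d / d otherwise."""
    if d <= 0:
        # Assumed convention: a zero distance receives full weight.
        return 1.0
    return min(1.0, c_d / d)

def leverage_cutoff(p, J, n):
    """Rule-of-thumb cutoff c_x = 2 (p + 1) (J - 1) / n."""
    return 2.0 * (p + 1) * (J - 1) / n

def mallows_weight(leverage, c_x):
    """Hard-rejection weight of Eq. (13): 1 below the cutoff, 0 otherwise."""
    return 1.0 if leverage < c_x else 0.0

def combined_weight(d, leverage, c_d, c_x):
    """Product of the distance-based and leverage-based weights."""
    return huber_weight(d, c_d) * mallows_weight(leverage, c_x)
```

For example, with `c_d = 0.22` an observation at distance 0.44 is downweighted to one half, while any observation whose leverage reaches the cutoff is dropped regardless of its distance.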
The consistency correction vector $c_i$ is defined as
= (w(X (1) (P)) -w(df\$)))/diag{^{®j), 



!,...,« 



where w(df } (P)) - w(X;{fe - 7T ! (P)}/diflg[E^(P)] ^ with h = {0, 1}, is the weight for 
fx- 

Implementation of the estimator 

The continuous updating estimation method is applied in this study for estimating the regression coefficients and the corresponding variance. The procedure is as follows:

1. Apply an initial value $\beta^{(0)}$ for computing $\Sigma_n(\beta)$.

2. Compute $d_i(\beta)$ using Eq. (10) and $H_i$ using Eq. (12); assign weights correspondingly based on (11) and (13).

3. With the combined weights, calculate $\Sigma_n^w(\beta)$ and $U_n^w(\beta)$ in Eq. (9).

4. Obtain the estimator $\hat{\beta}_w$ by minimizing Eq. (8).

5. Go back to Step 1, replace $\beta^{(0)}$ with the estimator $\hat{\beta}_w$ in computing $\Sigma_n^w(\beta)$, and move to the next iteration.

6. Continue this procedure until the convergence criteria are met.

A reasonable choice for the starting value is the MLE based on the original data.

In the appendix, we prove that, under some regularity assumptions, $\hat{\beta}_w$ is consistent for $\beta_0$. By studying the behavior of the weighted moment equations in a neighborhood of $\beta_0$, we show that their asymptotic linearity ensures the applicability of the central limit theorem, which gives the asymptotic normality of GMWM.

RESULTS 

Monte Carlo simulations 

In this section we investigate the properties of the GMWM estimator using a Monte Carlo study. We generate data with three response categories and two covariates drawn from a multivariate normal distribution with zero mean and identity covariance. The true coefficient matrix $\beta_0$ is

$$\beta_0 = \begin{pmatrix} \beta_{10} & \beta_{20} & \beta_{30} \\ \beta_{11} & \beta_{21} & \beta_{31} \\ \beta_{12} & \beta_{22} & \beta_{32} \end{pmatrix} = \begin{pmatrix} 0 & 1.0 & -0.3 \\ 0 & -0.8 & 0.7 \\ 0 & -1.0 & -0.5 \end{pmatrix}.$$

Based on the specified coefficient values and model (1), we compute the category-specific probabilities for each subject. Then each subject is assigned to a category by a random draw from the multinomial distribution with the computed probability vector as its parameter. For instance, in R, multinomial categories
Table 3 Bias of parameter estimates and MSE from randomly generated data without outliers.

                               MLE                              GMWM
n      Parameter   True    Bias      MSE      Coverage    Bias      MSE      Coverage
100    β_20         1.0     0.0666   0.1030   0.945        0.0488   0.1986   0.949
       β_30        -0.3    -0.0059   0.1206   0.957       -0.1440   0.5578   0.952
       β_21        -0.8    -0.0654   0.1190   0.938       -0.0513   0.2550   0.961
       β_31         0.7     0.0566   0.1892   0.963        0.2318   0.5468   0.923
       β_22        -1.0    -0.0853   0.1764   0.969       -0.0691   0.2380   0.950
       β_32        -0.5    -0.0624   0.1453   0.945        0.0203   0.3195   0.964
1000   β_20         1.0     0.0050   0.0087   0.956        0.0043   0.0181   0.962
       β_30        -0.3    -0.0055   0.0105   0.984       -0.0106   0.0333   0.950
       β_21        -0.8    -0.0039   0.0099   0.943       -0.0013   0.0251   0.956
       β_31         0.7     0.0081   0.0160   0.968        0.0162   0.0401   0.954
       β_22        -1.0    -0.0071   0.0145   0.987       -0.0025   0.0258   0.948
       β_32        -0.5    -0.0047   0.0122   0.948        0.0041   0.0361   0.947


are generated using the rmultinom($n_i$, $N_i$, $\pi(x_i)$) function, where $\pi(x_i) = (\pi_1(x_i), \ldots, \pi_J(x_i))$ is the probability vector, $n_i$ is the number of random vectors to draw, and $N_i$ is the total number of objects that are put into the $J$ categories. In our case, $n_i = N_i = 1$ for all subjects and $J = 3$.
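
The data-generating step can be sketched in Python as follows. This is a rough analogue of the R call described above, not the code used in the study: `simulate` and `category_probabilities` are names chosen for this example, the covariates are drawn as independent standard normals, and the seed is arbitrary.

```python
import math
import random

def category_probabilities(beta, x):
    """Probabilities [pi_1, ..., pi_J] with category 1 as the reference."""
    eta = [0.0] + [b[0] + sum(bk * xk for bk, xk in zip(b[1:], x)) for b in beta]
    m = max(eta)
    expo = [math.exp(e - m) for e in eta]
    total = sum(expo)
    return [e / total for e in expo]

def simulate(n, beta, seed=0):
    """Draw n subjects: covariates ~ N(0, I), one multinomial trial each."""
    rng = random.Random(seed)
    p = len(beta[0]) - 1  # number of covariates (the first entry is the intercept)
    data = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(p)]
        probs = category_probabilities(beta, x)
        # A single draw from the categories 1..J with the model probabilities.
        y = rng.choices(range(1, len(probs) + 1), weights=probs, k=1)[0]
        data.append((x, y))
    return data

beta_true = [[1.0, -0.8, -1.0],   # category 2 vs. category 1
             [-0.3, 0.7, -0.5]]   # category 3 vs. category 1
sample = simulate(100, beta_true, seed=1)
```

Each element of `sample` pairs a covariate vector with a category label in {1, 2, 3}; repeating the call with different seeds gives independent replicates.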

Two sample sizes, 100 and 1000, are examined. For each sample size, we run the simulation 1000 times. Average biases and MSEs are calculated and tabulated. Table 3 shows the results from randomly generated data with no outliers added. When the sample size is small, GMWM gives greater biases on $\beta_{30}$ and $\beta_{31}$ than the MLE method. For the sample size of 1000, the biases on these two parameters also increase, but less markedly. Variances are also inflated because of the weights we applied.
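
The summary measures reported in the simulation tables can be computed from the replicates along the following lines. This is a sketch under an explicit assumption: coverage is taken to be the fraction of replicates whose Wald interval (estimate ± 1.96 standard errors) contains the true value, which the text does not state. The function name `summarize` is hypothetical.

```python
def summarize(estimates, std_errors, true_value, z=1.96):
    """Return (bias, MSE, coverage) for one parameter across replicates."""
    n_rep = len(estimates)
    # Bias: average estimate minus the true value.
    bias = sum(estimates) / n_rep - true_value
    # MSE: average squared deviation from the true value.
    mse = sum((e - true_value) ** 2 for e in estimates) / n_rep
    # Coverage (assumed definition): share of Wald intervals containing the truth.
    covered = sum(
        1 for e, s in zip(estimates, std_errors) if abs(e - true_value) <= z * s
    )
    return bias, mse, covered / n_rep
```

Applied to the 1000 estimates of a single coefficient, this returns one row of the bias, MSE, and coverage columns.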

Outliers are generated from a multivariate normal distribution with mean vector (2, 3) and identity covariance $I_2$. The responses of these outliers are intentionally misclassified, that is, they are placed in a category different from the one predicted by the true parameters.

Table 4 lists the simulation results with outliers added. For estimates from datasets with 5% outliers, the bias correction from the GMWM is excellent. However, when the datasets have 10% outliers, the biases in the estimates of some parameters ($\beta_{21}$ and $\beta_{22}$ in this simulation) are reduced, but not completely corrected.

Application 

For the hypertension data, the criterion for identifying observations with large distances is $c_d = 0.22$, and the criterion for identifying leverage points is $c_x = 0.12$. When the GMWM estimator is applied, the blue-colored points in Fig. 1 are automatically downweighted, and the red-colored points receive zero weight. The GMWM method indeed eliminates those
Table 4 Comparison between GMWM and MLE estimation from randomly generated data with outliers added.

5% contamination
                            GMWM                             MLE
Size   Parameter    Bias      MSE      Coverage    Bias      MSE      Coverage
100    β_20          0.0568   0.1102   0.956        0.0860   0.0884   0.957
       β_30         -0.0038   0.1427   0.954       -0.0055   0.1528   0.949
       β_21         -0.0392   0.1464   0.949        0.2377   0.1360   0.785
       β_31          0.0175   0.2020   0.944       -0.1072   0.1270   0.921
       β_22          0.0374   0.1207   0.949        0.3848   0.2115   0.578
       β_32         -0.0548   0.1572   0.956       -0.0964   0.0904   0.964
1000   β_20          0.0172   0.0189   0.939        0.0490   0.0102   0.932
       β_30          0.0012   0.0340   0.945        0.0124   0.0075   0.952
       β_21          0.0260   0.0242   0.937        0.2874   0.0885   0.101
       β_31         -0.0058   0.0356   0.950       -0.1423   0.0345   0.697
       β_22          0.0366   0.0237   0.936        0.4390   0.2032   0.000
       β_32         -0.0106   0.0292   0.951       -0.0538   0.0103   0.940

10% contamination
                            GMWM                             MLE
Size   Parameter    Bias      MSE      Coverage    Bias      MSE      Coverage
100    β_20          0.0489   0.0999   0.971        0.0868   0.0819   0.970
       β_30         -0.0057   0.1510   0.945       -0.0431   0.1461   0.814
       β_21          0.0319   0.1227   0.946        0.3607   0.1933   0.579
       β_31         -0.0235   0.1770   0.943       -0.1631   0.1283   0.949
       β_22          0.0207   0.0968   0.945        0.6088   0.4151   0.526
       β_32         -0.0817   0.1349   0.977       -0.1069   0.0803   0.967
1000   β_20          0.0451   0.0202   0.944        0.0657   0.0120   0.900
       β_30         -0.0071   0.0336   0.952       -0.0111   0.0063   0.822
       β_21          0.0164   0.0207   0.936        0.3876   0.1545   0.002
       β_31         -0.0497   0.0346   0.917       -0.2269   0.0658   0.521
       β_22          0.0238   0.0182   0.938        0.6500   0.4322   0.000
       β_32         -0.0434   0.0250   0.953       -0.0629   0.0106   0.902


inconsistencies: the coefficient of sodium intake for the odds between the Stage 2 hypertension and the Normal categories is no longer negative (see the GMWM columns of Table 2).

As the results indicate, age, gender, and BMI all had a significant impact on hypertension status. For example, a one-unit increase in BMI multiplied the odds of having Stage 2 hypertension, relative to the normal status, by 1.26 (95% confidence interval [1.16-1.35]). With a one-year increase in age, a subject was 1.07 (95% CI [1.06-1.10]) times more likely to have Stage 2 hypertension than to stay at the normal healthy status. In contrast to the MLE results for sodium intake, from which it was difficult to draw a conclusion because of the inconsistent estimates, we now find that sodium intake was statistically significant. When the daily intake of sodium increased by one gram, a subject was 1.26 (95% CI [1.15-1.37]) times more likely to have Stage 1 hypertension, and 1.25 (95% CI [1.17-1.35]) times more likely to have Stage 2 hypertension. These results are consistent with the findings from previous studies (National Research Council, 2005; He & MacGregor, 2004).
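
The odds ratios quoted above are exponentiated coefficients. As a minimal sketch, the following Python snippet converts a coefficient and its standard error into an odds ratio with a Wald-type interval; the use of the normal quantile 1.96 is an assumption of this example, since the paper does not state how its intervals were computed, and the function name is hypothetical. The inputs are the GMWM estimate and standard error of the BMI coefficient for Stage 2 hypertension in Table 2.

```python
import math

def odds_ratio_ci(coef, std_err, z=1.96):
    """Return (odds ratio, lower bound, upper bound) on the odds scale."""
    return (
        math.exp(coef),
        math.exp(coef - z * std_err),
        math.exp(coef + z * std_err),
    )

# GMWM estimate for BMI, Stage 2 vs. Normal (Table 2).
or_bmi, lower, upper = odds_ratio_ci(0.2279, 0.0388)
```

The resulting odds ratio rounds to the 1.26 reported in the text, and the bounds are close to the reported interval; small differences in the last digit can arise from rounding or from a slightly different critical value.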

DISCUSSION 

A reasonable choice for fitting ordinal response data is the proportional odds model, provided the proportional odds assumption is not violated. Proportional odds models incorporate the ordinal information into the modeling, and they reduce the number of parameters needed relative to the generalized logit model. Unfortunately, our data do not meet the fundamental assumption of proportional odds models, which led us to treat the outcome as a nominal response.

Data with a nominal response and some continuous covariates are commonly seen in many scientific areas, such as sociology, economics, and biomedical studies. In order to deal with outliers, we modified the GMM estimator by replacing the standard moment conditions with weighted moment conditions, so that aberrant observations automatically receive less weight. We proved that the proposed method has good asymptotic behavior. When outliers are present, the GMWM estimator gives much smaller biases than the estimates derived from the traditional MLE method. This method can be adapted to check whether results obtained with the traditional MLE approach are driven only by a few outlying observations. The weights produced by the robust procedure can be used to diagnose the cause of the differences and to indicate routes for model re-specification.

APPENDIX: CONSISTENCY AND ASYMPTOTIC 
NORMALITY 

In this appendix, we introduce the assumptions for the asymptotic analysis of GMWM, and outline the derivations of its main asymptotic properties.

We make the following sets of regularity and identification assumptions regarding the properties of the moment functions.

Assumption I

I1. $B$ is a compact parameter space.

I2. $\Sigma$ is a positive definite matrix.

I3. It holds that $E[u^w(\beta)] = 0$ if and only if $\beta = \beta_0$, and for any $\epsilon > 0$,

$$\inf_{\beta \in B \setminus \mathcal{N}(\beta_0, \epsilon)} \|E[u^w(\beta)]\| > 0,$$

where $\mathcal{N}(\beta_0, \epsilon) = \{\beta \in \mathbb{R}^p \mid \|\beta - \beta_0\| < \epsilon\}$ is an open $\epsilon$-neighborhood of the point $\beta_0$.

Assumption F

F1. Let $u^w(\beta)$ be continuous in $\beta \in B$, and be twice differentiable in $\beta$ on $\mathcal{N}(\beta_0, \epsilon)$ almost surely.

F2. The expectations $E \sup_{\beta \in B} \|u^w(\beta)\|$, $E \sup_{\beta \in \mathcal{N}(\beta_0, \epsilon)} \|\partial u^w(\beta)/\partial\beta_k\|$, and $E \sup_{\beta \in \mathcal{N}(\beta_0, \epsilon)} |\partial^2 u^w(\beta)/\partial\beta_k \partial\beta_l|$ exist and are finite for $k, l = 1, \ldots, p$.

Assumption W

W1. $\lim_{\epsilon \to 0} \sup_{\|\Delta\| < \epsilon} |w(\beta + \Delta) - w(\beta)| = 0$.

W2. $\lim_{\epsilon \to 0} \sup_{\|\Delta\| < \epsilon} |\partial w(\beta + \Delta)/\partial\beta - \partial w(\beta)/\partial\beta| = 0$.

When the above assumptions are met, we can prove that $\hat{\beta}_w$ is consistent for $\beta_0$. We begin by studying the behavior of the weighted moment equations in a neighborhood of $\beta_0$, and then prove their asymptotic linearity. The linearity ensures the applicability of the central limit theorem, which gives the asymptotic normality of GMWM.

Theorem 1. Let the assumptions F and I hold. Then the GMWM estimator $\hat{\beta}_w$ is asymptotically normal, that is, $\sqrt{n}(\hat{\beta}_w - \beta_0) \xrightarrow{d} N(0, M^T S^w M)$ as $n \to \infty$, where

$$M = \left((V^w)^T \Sigma V^w\right)^{-1} (V^w)^T \Sigma, \qquad S^w = E\left[\frac{1}{n}\sum_{i=1}^{n} u_i^w(\beta)\, u_i^w(\beta)^T\right],$$

with $V^w = E\left[\partial U_n^w(\beta_0)/\partial\beta^T\right]$.

We start by proving two lemmas before we present the proof of Theorem 1.
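Lemma 1. Let the assumptions F, I and W hold, and let $U_r^w(\beta)$ be the $r$th element of the vector $U^w(\beta)$, $r = 1, \ldots, p$. Then, for $0 < s < 1$,

$$\sup_{\|t\| < C} \left| \frac{1}{n}\sum_{i=1}^{n}\sum_{l=1}^{p} \left\{ (\partial/\partial\beta_l)\, U_{ir}^w\!\left(\beta + \frac{st}{\sqrt{n}}\right) - (\partial/\partial\beta_l)\, U_{ir}^w(\beta) \right\} \right| = o_p(1). \tag{14}$$

Proof. For $l, r = 1, \ldots, p$, by differentiating the $i$th component of $U_r^w(\beta)$, we get

$$\frac{\partial U_{ir}^w(\beta)}{\partial\beta_l} = -w_i(\beta)\, x_{ir}\, \frac{\partial\pi_i(\beta)}{\partial\beta_l} + \frac{\partial w_i(\beta)}{\partial\beta_l}\, x_{ir}\, (y_i - \pi_i(\beta)).$$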

Then, 



sup 

llf||<c 



" EE*' I (9 / 9 ^ } ^ (P + 4=) - (3/3A) ^Tr(P) 

-EE" su p \m0i)uZr(v+ J 7=) -WWl)U? r Q) 

n \\t\\<c\ V V"/ 



< 



and 



sup 

l|f||<C 



sf 



(9/3A)UT r P + -p - 0/3A)U£(p) 



< sup 

IMI<c 



3/3^ p+ 



sf 



+ sup 

l|(||<C 



St 



(9/3^- p+— -0/3A)7r t (p) 



|X; W ;(p)| 



+ sup f |o/3fl)wi(p+ 4=) - (3/3A)wi(P) 
IKIKcll V V»/ 



Xi /i-TTi p+ — 



Sf 



+ sup { 1 1 yi - TTi I P + 



-( r; -7r ; (p)) (a/3/J/)w;(P)| 



Then, by taking expectation at both sides, 



£•{ sup 

IMI<c 



St 



(3/3#)Uf f p+ — - (3/3j8,)Uf r (p) 



< sup 

IMI<c 



Wi(p+4=)-Wi(p) sup (3/3^/)7rj(p + 4= 

V V"/ ntii<c\ V v« 
+ sup

l|(||<C 



+ sup 

l|f||<C 

+ sup 

|]f||<c 



St 



(d/dPi)m P+^= -(3/3A)jr»(P) 



st 



(d/dfr)wi[ P+ -= - (3/3ft)wi(P) 



sup |X;w;(p)| 

ll*li<c 



£< sup 

\\t\\<c 



xdn-itA P + 



ri-^ p + 



St 



Thus, by conditions F and W, we have 



El sup 

ll|t||<c 

and 



Ei sup 

llf||<c 



(d/dfi)vr r (p+ -^)-v<i/Wi)U"jfr 



sup {d/dfii)Wi$)\. 

\t\\<c 



0, Vi 



1 11 ^ { / st \ 

"EE'* 0/3A)Ur r (P + -= ) - 0/3fl)U? f (P) 

" i=i 1=1 I \ V«/ 



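$\to 0, \quad \forall i.$

Therefore, we have the results in (14).

Lemma 2. Let the assumptions F, I and W hold. Then

$$\sup_{\|t\| < C} \left\| U_n^w(\beta_0 + n^{-1/2} t) - U_n^w(\beta_0) + V^w n^{-1/2} t \right\| = o_p(1), \tag{15}$$

as $n \to \infty$, where $V^w = E\left[\partial U_n^w(\beta_0)/\partial\beta\right]$.

Proof. Write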



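$$U_n^w(\beta_0 + n^{-1/2} t) - U_n^w(\beta_0) = \sum_{i=1}^{n} w_i(\beta_0 + n^{-1/2} t)\, u_i(\beta_0 + n^{-1/2} t) - \sum_{i=1}^{n} w_i(\beta_0)\, u_i(\beta_0).$$

By the Taylor expansion, $u_i(\beta_0 + n^{-1/2} t) = u_i(\beta_0) + n^{-1/2}\, t\, \frac{\partial}{\partial\beta} u_i\!\left(\beta_0 + \frac{\delta t}{\sqrt{n}}\right)$, where $0 < \delta < 1$. Then, we can write

$$U_n^w(\beta_0 + n^{-1/2} t) - U_n^w(\beta_0)$$
$$= \sum_{i=1}^{n} u_i(\beta_0)\left[w_i(\beta_0 + n^{-1/2} t) - w_i(\beta_0)\right] \tag{16}$$
$$+ \frac{1}{\sqrt{n}} \sum_{i=1}^{n} w_i(\beta_0)\, t\, \frac{\partial}{\partial\beta} u_i(\beta_0) \tag{17}$$
$$+ \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \left[w_i(\beta_0 + n^{-1/2} t) - w_i(\beta_0)\right] t\, \frac{\partial}{\partial\beta} u_i(\beta_0) \tag{18}$$
$$+ \frac{1}{\sqrt{n}} \sum_{i=1}^{n} w_i(\beta_0 + n^{-1/2} t)\, t \left[\frac{\partial u_i\!\left(\beta_0 + \frac{\delta t}{\sqrt{n}}\right)}{\partial\beta} - \frac{\partial u_i(\beta_0)}{\partial\beta}\right]. \tag{19}$$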
(19) 
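We will now show that terms (16), (18) and (19) are asymptotically negligible. As to the term (16), by assumption W1, $\{w_i(\beta_0 + n^{-1/2} t) - w_i(\beta_0)\} \to 0$, and $u_i(\beta_0)$ is independent of $\beta$. So the term (16) tends to zero. Similarly, $\partial/\partial\beta\, u_i(\beta_0)$ is independent of $\beta$ and $t$ is bounded. Hence, the term (18) tends to zero. Lemma 1 implies

$$\frac{\partial u_i\!\left(\beta_0 + \frac{\delta t}{\sqrt{n}}\right)}{\partial\beta} - \frac{\partial u_i(\beta_0)}{\partial\beta} \to 0,$$

as $n \to \infty$. So the term (19) can be neglected too.

Now, let us analyze the term (17). Let $w_i^*(\beta_0)$ be the limit of $w_i(\beta_0)$. Rewrite (17) as

$$\frac{1}{\sqrt{n}} \sum_{i=1}^{n} w_i(\beta_0)\, t\, \frac{\partial}{\partial\beta} u_i(\beta_0) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \left[w_i(\beta_0) - w_i^*(\beta_0)\right] t\, \frac{\partial}{\partial\beta} u_i(\beta_0) \tag{20}$$
$$+ \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \left\{ w_i^*(\beta_0)\, t\, \frac{\partial}{\partial\beta} u_i(\beta_0) - E\left[w_i^*(\beta_0)\, t\, \frac{\partial}{\partial\beta} u_i(\beta_0)\right] \right\} \tag{21}$$
$$+ \frac{1}{\sqrt{n}} \sum_{i=1}^{n} E\left[w_i^*(\beta_0)\, t\, \frac{\partial}{\partial\beta} u_i(\beta_0)\right]. \tag{22}$$

The first term (20) is negligible because $\partial/\partial\beta\, u_i(\beta_0)$ is independent of $\beta$, $t$ is bounded, and $[w_i(\beta_0) - w_i^*(\beta_0)] \to 0$. By the central limit theorem, each element of the vector (21) converges in distribution to a normally distributed random variable with zero mean and a finite variance which is uniformly bounded in $t$. Hence, (21) is bounded in probability.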
The last term (20) is 



9 P 



This proves the lemma. 



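Proof of Theorem 1. Since $t_n = \sqrt{n}(\hat{\beta}_w - \beta_0) = O_p(1)$ as $n \to \infty$ by Lemma 2, we can write (15) as

$$U_n^w(\beta_0 + n^{-1/2} t_n) - U_n^w(\beta_0) + V^w n^{-1/2} t_n = o_p(1) \tag{23}$$

with a probability arbitrarily close to one uniformly in $t_n \in \{t : \|t\| < C\}$. Moreover, with $n^{-1/2} t_n = o_p(1)$, $\partial U_n^w(\beta_0 + n^{-1/2} t_n)/\partial\beta \to V^w$ in probability as $n \to \infty$. Note that the first-order conditions of GMWM are equal to 0, that is,

$$\frac{\partial Q_n^w(\hat{\beta}_w)}{\partial\beta} = \left[\frac{\partial U_n^w(\hat{\beta}_w)}{\partial\beta}\right]^T \Sigma(\hat{\beta}_w)\, U_n^w(\hat{\beta}_w) = 0.$$

Replacing $U_n^w(\hat{\beta}_w)$ with $U_n^w(\beta_0 + n^{-1/2} t_n)$ from Eq. (23),

$$\left[\frac{\partial U_n^w(\beta_0 + n^{-1/2} t_n)}{\partial\beta}\right]^T \Sigma(\hat{\beta}_w)\, U_n^w(\beta_0 + n^{-1/2} t_n) = \left[V^w + o_p(1)\right]^T \Sigma(\hat{\beta}_w) \left[U_n^w(\beta_0) - V^w n^{-1/2} t_n + o_p(1)\right] = 0.$$

Then we have

$$t_n = \sqrt{n}(\hat{\beta}_w - \beta_0) = \sqrt{n}\left[(V^w)^T \Sigma(\hat{\beta}_w) V^w\right]^{-1} (V^w)^T \Sigma(\hat{\beta}_w)\, U_n^w(\beta_0) + o_p(1). \tag{24}$$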
Next we examine the behavior of $\sqrt{n}\, U_n^w(\beta_0)$, which can be written as

$$\sqrt{n}\, U_n^w(\beta_0) = n^{-1/2} \sum_{i=1}^{n} u_i(\beta_0)\, w_i(\beta_0)$$
$$= n^{-1/2} \sum_{i=1}^{n} u_i(\beta_0)\left\{w_i(\beta_0) - w_i^*(\beta_0)\right\} \tag{25}$$
$$+ n^{-1/2} \sum_{i=1}^{n} u_i(\beta_0)\, w_i^*(\beta_0). \tag{26}$$

Note that the term (25) is asymptotically negligible in probability due to the triangle inequality and assumption W1. The term (26) is a stationary sequence of absolutely random variables. By assumptions I3 and F2, (26) has zero mean and finite second moments. So the central limit theorem can be applied to (26), giving $\sqrt{n}\, U_n^w(\beta_0) \sim N(0, S^w)$ (Davidson, 1994, Section 25.3).

With Eq. (24), we have the asymptotic normality of $\hat{\beta}_w$, and its asymptotic variance is given by $M^T S^w M$ (Davidson, 1994).